feat: add LangWatch as optional observability backend alongside Langfuse #10
tusharjadhav3302 wants to merge 3 commits into forge-sdlc:main
gryf left a comment
Small issue with the imported module. Other than that, it seems right to me.
Made-with: Cursor
Thanks for the PR and the write-up, but I do have some concerns before we can move forward.

On the motivation: A few of the features presented as LangWatch-exclusive actually exist in Langfuse today. For example, Langfuse supports live evaluation and LLM-as-a-judge methods (https://langfuse.com/docs/evaluation/evaluation-methods/llm-as-a-judge), which cover a meaningful part of the evaluation story. Beyond that, Forge's general evaluation philosophy is real-world evaluation: does the code pass CI, does the PR get merged? Some of the evaluators mentioned (BLEU, ROUGE) therefore aren't directly relevant to what we're trying to measure. Before we add a second tracing backend, I'd like to see a precise list of: (a) which specific Langfuse capabilities are genuinely missing, (b) how you plan to use the LangWatch equivalents in practice, and (c) how they connect to Forge's actual quality goals.

On technical gaps: From an initial review, there are a few things missing from this PR:

On the broader principle: Adding a second tracing system is a meaningful decision. It adds operational complexity, another service to run, another set of credentials to manage, and another surface to maintain. That cost needs to be justified by a clear capability gap, not just "LangWatch has more features." Having additional features doesn't mean we need them or have a good enough reason to use them. My strong preference is to stay with a single tracing system unless there's a concrete, specific capability that Langfuse provably cannot deliver for our use case.

As things stand, I'm inclined to reject this on the grounds raised above: the PR adds meaningful code complexity without a sufficiently clear reason. If you can provide a complete breakdown of the missing Langfuse capabilities, how you intend to use the LangWatch equivalents, and how they support Forge's tracing and evaluation goals, I'm happy to revisit.
Summary
Adds LangWatch as an optional, parallel observability backend alongside Langfuse.
Full proposal with motivation, design, alternatives, and risks:
proposals/009-langwatch-integration.md
Changes
- src/forge/integrations/langwatch/ module (2 files, 143 lines)
- LANGWATCH_ENABLED, LANGWATCH_API_KEY, LANGWATCH_ENDPOINT
- .env.example and developer guide with LangWatch setup docs
No Breaking Changes
- … LANGWATCH_ENABLED=false)
Test Plan
- uv run forge-serve starts cleanly with LANGWATCH_ENABLED=false
- uv run forge worker starts cleanly with LANGWATCH_ENABLED=false
- uv run pytest tests/unit/ passes
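
For readers unfamiliar with the opt-in pattern the description relies on, here is a minimal sketch of how the three environment variables could gate an optional backend. It is not the code from this PR: the `LangWatchSettings` class, the `load_langwatch_settings` function, the set of accepted truthy strings, and the rule that an API key is required when the backend is enabled are all assumptions made for illustration. Only the variable names and the disabled-by-default behavior come from the description above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "enabled" is a guess, not taken from the PR.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LangWatchSettings:
    """Hypothetical settings holder; the PR's real settings class may differ."""

    enabled: bool = False
    api_key: Optional[str] = None
    endpoint: Optional[str] = None


def load_langwatch_settings(env: Mapping[str, str] = os.environ) -> LangWatchSettings:
    """Read the LANGWATCH_* variables, defaulting to a disabled backend."""
    enabled = env.get("LANGWATCH_ENABLED", "false").strip().lower() in _TRUTHY
    if not enabled:
        # Disabled: ignore any other LANGWATCH_* values so nothing is initialized.
        return LangWatchSettings()

    api_key = env.get("LANGWATCH_API_KEY")
    if not api_key:
        # Assumption: fail fast rather than silently tracing nowhere.
        raise ValueError("LANGWATCH_ENABLED is true but LANGWATCH_API_KEY is not set")

    # An unset or empty endpoint is stored as None (caller picks a default).
    return LangWatchSettings(
        enabled=True,
        api_key=api_key,
        endpoint=env.get("LANGWATCH_ENDPOINT") or None,
    )
```

Passing the environment as a plain mapping keeps the loader easy to unit-test without touching the process environment, which fits the "starts cleanly with LANGWATCH_ENABLED=false" checks in the test plan.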